In compiler theory, loop interchange is the process of exchanging the order of two iteration variables.
For example, in the code fragment:
for i from 0 to 10 for j from 0 to 20 a[i,j] = i + j
loop interchange would result in:
for j from 0 to 20 for i from 0 to 10 a[i,j] = i + j
On occasion, such a transformation may lead to opportunities to further optimize, such as vectorization of the array assignments.
Contents |
One major purpose of loop interchange is to improve the cache performance for accessing array elements. Cache misses occur if the contiguously accessed array elements within the loop come from a different cache line. Loop interchange can help prevent this. The effectiveness of loop interchange depends on and must be considered in light of the cache model used by the underlying hardware and the array model used by the compiler.
In the C programming language, the array elements from the same row are stored consecutively (Ex: a[1,1],a[1,2], a[1,3]... ), namely row-major. On the other hand, FORTRAN programs store array elements from the same column together(Ex: a[1,1],a[2,1],a[3,1]...) , called column-major. Thus the order of two iteration variables in the first example is suitable for a C program while the second example is better for FORTRAN.[1] Optimizing compilers can detect the improper ordering by programmers and interchange the order to achieve better cache performance.
Like any other compiler optimization, loop interchange may lead to worse performance because cache performance is only part of the story. Take the following example:
do i = 1, 10000 do j = 1, 1000 a(i) = a(i) + b(j,i) * c(i) end do end do
Loop interchange on this example can improve the cache performance of accessing b(j,i), but it will ruin the reuse of a(i) and c(i) in the inner loop, as it introduces two extra loads (for a(i) and for c(i)) and one extra store (for a(i)) during each iteration. As a result, the overall performance may be degraded after loop interchange.
It is not always safe to exchange the iteration variables due to dependencies between statements for the order in which they must execute. To determine whether a compiler can safely interchange loops, dependence analysis is required.
Kennedy, Ken; & Allen, Randy. (2001). Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann. ISBN 1-55860-286-0.
[ Link Broken ]